The data in this report is the Electricity Consumption and Occupancy (ECO) dataset (Beckel et al. (2014), Kleiminger, Beckel, and Santini (2015)). This dataset contains two parts: plug data and smart meter data. The plug data describes the energy consumption by second for each day by various appliances. The smart meter data is the energy consumption measured by three-phase smart meter for entire household. There are three households in the dataset: 04, 05, and 06. This report will show the household usage similarity and differences over a day, including the total usage and by different appliances. Thus, the households and their related total and average usage by hour from both plug data and smart meter data will be used in this report.
More specifically, the plug dataset is used to calculate the average 10-minutes usage for the first 14 days in the dataset. The usage from different household and different phase is recorded. The smart meter dataset is used to obtain the hourly usage averaged by all date in the dataset grouping by households and appliances.
Data-science questions
The main question is household usages similarity and differences over a day. This report is a demonstration to show how usage can be similar or different by households. If there are large scale household data, this will help to understand the overall usage pattern, so that utility provider can make plans for the adjustment when necessary, i.e. utility distribution during the peak hour.
More specific questions related to the individual visualizations are described as follow.
How similar/difference is the average usage distributed by time for different households? For example, is there any peak time that usage from all households are high?
How similar/difference is the average usage by appliances by different households throughout the day?
Data preparation and EDA
EDA
This session performs Exploratory Data Analysis (EDA) with necessary data preparation.
Plug data
Load 04 household to explore.
Code
## read data## plug 04 household## Measurement period:## 27.06.12 to 23.01.13## 01: Fridge (no. days: 194, coverage: 97.01%)## 02: Kitchen appliances (no. days: 194, coverage: 96.81%) (*)## 03: Lamp (no. days: 170, coverage: 93.54%) (**)## 04: Stereo and laptop (no. days: 169, coverage: 90.98%)## 05: Freezer (no. days: 192, coverage: 93.08%)## 06: Tablet (no. days: 189, coverage: 93.6%)## 07: Entertainment (no. days: 186, coverage: 94.69%) (***)## 08: Microwave (no. days: 194, coverage: 97.08%)appliance <-c("01", "02", "03", "04", "05", "06", "07", "08")for (a in appliance) { file_list <-list.files(path =paste0("../../eco/04_plug/", a),pattern =".csv",all.files =FALSE,full.names =FALSE )## initiate variable name for each applicance df_name <-paste0("p04_", a)for (file in file_list) { path <-paste0("../../eco/04_plug/", a, "/", file)# if the merged dataset doesn't exist, create itif (!exists(df_name)) { temp <-read.table(path, header =FALSE)colnames(temp) <-substr(file, 1, 10)assign(df_name, temp)rm(temp)next }# if the merged dataset does exist, append to itif (exists(df_name)) { temp_dataset <-read.table(path, header =FALSE)colnames(temp_dataset) <-substr(file, 1, 10) temp <-cbind(get(df_name), temp_dataset)assign(df_name, temp)rm(temp_dataset, temp) } }}
The plug data for 04 household solely took 1G ram, ~1 mins to read. This indicate that the data is needed to be aggregated in some way to plot.
Plot 04 household Fridge data on 2012-06-27 as example. There is not much information from the plot.
Code
# sample plotplot(p04_01$`2012-06-27`, type ="l", main ="Fridge plug data on 2012-06-27")
To compare, next to plot Lamp data on 2012-06-27.
Code
# sample plotplot(p04_02$`2012-06-27`, type ="l", main ="Lamp plug data on 2012-06-27")
There is clear difference in the pattern of different appliance.
Load 05 household to explore to see if there is any difference or similarity.
Code
## read data## plug 05 household# Measurement period:# 27.06.12 to 31.01.13## 01: Tablet (no. days: 218, coverage: 97.87%)# 02: Coffee machine (no. days: 218, coverage: 95.16%)# 03: Fountain (no. days: 71, coverage: 99.43%) (*)# 04: Microwave (no. days: 218, coverage: 97.87%)# 05: Fridge (no. days: 218, coverage: 97.87%)# 06: Entertainment (no. days: 192, coverage: 89.14%) (**)# 07: PC (no. days: 218, coverage: 97.87%) (***)# 08: Kettle (no. days: 25, coverage: 76.64%)appliance <-c("01", "02", "03", "04", "05", "06", "07", "08")for (a in appliance) { file_list <-list.files(path =paste0("../../eco/05_plug/", a),pattern =".csv",all.files =FALSE,full.names =FALSE )## initiate variable name for each applicance df_name <-paste0("p05_", a)for (file in file_list) { path <-paste0("../../eco/05_plug/", a, "/", file)# if the merged dataset doesn't exist, create itif (!exists(df_name)) { temp <-read.table(path, header =FALSE)colnames(temp) <-substr(file, 1, 10)assign(df_name, temp)rm(temp)next }# if the merged dataset does exist, append to itif (exists(df_name)) { temp_dataset <-read.table(path, header =FALSE)colnames(temp_dataset) <-substr(file, 1, 10) temp <-cbind(get(df_name), temp_dataset)assign(df_name, temp)rm(temp_dataset, temp) } }}
Plot 05 household Fridge data on 2012-06-27 as same as 04 household. The difference is obvious.
Code
# sample plotplot(p05_01$`2012-06-27`, type ="l", main ="Fridge plug data on 2012-06-27")
Plot Lamp data on 2012-06-27 for 05 household.
Code
# sample plotplot(p05_02$`2012-06-27`, type ="l", main ="Lamp plug data on 2012-06-27")
Smart meter data
The concept of three-phase smart meter is explained here. We are going to look at the total and each phase power.
Code
## read data## plug 04 household# SMART METER DATA# ----------------## Measurement period:# 27.06.12 to 31.01.13## Coverage:# No. days: 219, Coverage: 99.39%file_list <-list.files(path =paste0("../../eco/04_sm"),pattern =".csv",all.files =FALSE,full.names =FALSE )for (file in file_list) { path <-paste0("../../eco/04_sm/", file)for (phase in0:3) {# if the merged dataset doesn't exist, create it## initiate variable name for each phase df_phase <-paste0("s04_", phase)if (!exists(df_phase)) { temp <-read.csv(path, header =FALSE) temp <-data.frame(temp[, phase +1])colnames(temp) <-substr(file, 1, 10)assign(df_phase, temp)rm(temp)next }# if the merged dataset does exist, append to itif (exists(df_phase)) { temp_dataset <-read.csv(path, header =FALSE) temp_dataset <-data.frame(temp_dataset[, phase +1])colnames(temp_dataset) <-substr(file, 1, 10) temp <-cbind(get(df_phase), temp_dataset)assign(df_phase, temp)rm(temp_dataset, temp) } }}
Reading the data cost several minutes. This indicate that the data need to be aggregate or processed in some way.
Plot 2012-06-27 for all phases.
Code
par(mar=c(2.5,2.5,1,1))layout(matrix(c(1,2,3,4,5),ncol=1),heights=c(2,2,2,2,2))plot.new()text(0.5,0.5,"04 Smart meter power on 2012-06-27")plot(s04_0$`2012-06-27`, type ="l", main ="All phase", col ="red")plot(s04_1$`2012-06-27`, type ="l", main ="Phase 1",col ="blue")plot(s04_2$`2012-06-27`, type ="l", main ="Phase 2",col ="green")plot(s04_3$`2012-06-27`, type ="l", main ="Phase 3",col ="black")
Compare for a different household.
Code
## read data## plug 05 household# SMART METER DATA# ----------------# # Measurement period:# 27.06.12 to 31.01.13# # Coverage:# No. days: 215, Coverage: 99.05%file_list <-list.files(path =paste0("../../eco/05_sm"),pattern =".csv",all.files =FALSE,full.names =FALSE )for (file in file_list) { path <-paste0("../../eco/05_sm/", file)for (phase in0:3) {# if the merged dataset doesn't exist, create it## initiate variable name for each phase df_phase <-paste0("s05_", phase)if (!exists(df_phase)) { temp <-read.csv(path, header =FALSE) temp <-data.frame(temp[, phase +1])colnames(temp) <-substr(file, 1, 10)assign(df_phase, temp)rm(temp)next }# if the merged dataset does exist, append to itif (exists(df_phase)) { temp_dataset <-read.csv(path, header =FALSE) temp_dataset <-data.frame(temp_dataset[, phase +1])colnames(temp_dataset) <-substr(file, 1, 10) temp <-cbind(get(df_phase), temp_dataset)assign(df_phase, temp)rm(temp_dataset, temp) } }}
Plot 2012-06-27 for all phases for 05 household.
Code
par(mar=c(2.5,2.5,1,1))layout(matrix(c(1,2,3,4,5),ncol=1),heights=c(2,2,2,2,2))plot.new()text(0.5,0.5,"04 Smart meter power on 2012-06-27")plot(s05_0$`2012-06-27`, type ="l", main ="All phase", col ="red")plot(s05_1$`2012-06-27`, type ="l", main ="Phase 1",col ="blue")plot(s05_2$`2012-06-27`, type ="l", main ="Phase 2",col ="green")plot(s05_3$`2012-06-27`, type ="l", main ="Phase 3",col ="black")
Data preparation
This session describes the data preparation for results analysis and visualization.
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Attaching package: 'data.table'
The following objects are masked from 'package:dplyr':
between, first, last
Plug data
The steps and code below describe the preparation of daily average consumption by household data.
Read data
Code
## read data## plug 04 household## Measurement period:## 27.06.12 to 23.01.13## 01: Fridge (no. days: 194, coverage: 97.01%)## 02: Kitchen appliances (no. days: 194, coverage: 96.81%) (*)## 03: Lamp (no. days: 170, coverage: 93.54%) (**)## 04: Stereo and laptop (no. days: 169, coverage: 90.98%)## 05: Freezer (no. days: 192, coverage: 93.08%)## 06: Tablet (no. days: 189, coverage: 93.6%)## 07: Entertainment (no. days: 186, coverage: 94.69%) (***)## 08: Microwave (no. days: 194, coverage: 97.08%)appliance <-c("01", "02", "03", "04", "05", "06", "07", "08")for (a in appliance) { file_list <-list.files(path =paste0("../../eco/04_plug/", a),pattern =".csv",all.files =FALSE,full.names =FALSE )## initiate variable name for each applicance df_name <-paste0("p04_", a)for (file in file_list) { path <-paste0("../../eco/04_plug/", a, "/", file)# if the merged dataset doesn't exist, create itif (!exists(df_name)) { temp <-read.table(path, header =FALSE)colnames(temp) <-substr(file, 1, 10)assign(df_name, temp)rm(temp)next }# if the merged dataset does exist, append to itif (exists(df_name)) { temp_dataset <-read.table(path, header =FALSE)colnames(temp_dataset) <-substr(file, 1, 10) temp <-cbind(get(df_name), temp_dataset)assign(df_name, temp)rm(temp_dataset, temp) } }}## read data## plug 05 household# Measurement period:# 27.06.12 to 31.01.13## 01: Tablet (no. days: 218, coverage: 97.87%)# 02: Coffee machine (no. days: 218, coverage: 95.16%)# 03: Fountain (no. days: 71, coverage: 99.43%) (*)# 04: Microwave (no. days: 218, coverage: 97.87%)# 05: Fridge (no. days: 218, coverage: 97.87%)# 06: Entertainment (no. days: 192, coverage: 89.14%) (**)# 07: PC (no. days: 218, coverage: 97.87%) (***)# 08: Kettle (no. days: 25, coverage: 76.64%)appliance <-c("01", "02", "03", "04", "05", "06", "07", "08")for (a in appliance) { file_list <-list.files(path =paste0("../../eco/05_plug/", a),pattern =".csv",all.files =FALSE,full.names =FALSE )## initiate variable name for each applicance df_name <-paste0("p05_", a)for (file in file_list) { path <-paste0("../../eco/05_plug/", a, "/", file)# if the merged dataset doesn't exist, create itif (!exists(df_name)) { temp <-read.table(path, header =FALSE)colnames(temp) <-substr(file, 1, 10)assign(df_name, temp)rm(temp)next }# if the merged dataset does exist, append to itif (exists(df_name)) { temp_dataset <-read.table(path, header =FALSE)colnames(temp_dataset) <-substr(file, 1, 10) temp <-cbind(get(df_name), temp_dataset)assign(df_name, temp)rm(temp_dataset, temp) } }}## read data## plug 06 household# PLUG DATA# ---------## Measurement period:# 27.06.12 to 31.01.13## 01: Lamp (no. days: 166, coverage: 67.2%)# 02: Laptop (no. days: 185, coverage: 97.3%) (*)# 03: Router (no. days: 88, coverage: 96.73%) (**)# 04: Coffee machine (no. days: 179, coverage: 86.03%)# 05: Entertainment (no. days: 181, coverage: 95.86%) (***)# 06: Fridge (no. days: 179, coverage: 95.78%)# 07: Kettle (no. days: 147, coverage: 82.54%)appliance <-c("01", "02", "03", "04", "05", "06", "07")for (a in appliance) { file_list <-list.files(path =paste0("../../eco/06_plug/", a),pattern =".csv",all.files =FALSE,full.names =FALSE )## initiate variable name for each applicance df_name <-paste0("p06_", a)for (file in file_list) { path <-paste0("../../eco/06_plug/", a, "/", file)# if the merged dataset doesn't exist, create itif (!exists(df_name)) { temp <-read.table(path, header =FALSE)colnames(temp) <-substr(file, 1, 10)assign(df_name, temp)rm(temp)next }# if the merged dataset does exist, append to itif (exists(df_name)) { temp_dataset <-read.table(path, header =FALSE)colnames(temp_dataset) <-substr(file, 1, 10) temp <-cbind(get(df_name), temp_dataset)assign(df_name, temp)rm(temp_dataset, temp) } }}
Replace negative values (missing values) as NA
Code
households <-c("04", "05", "06")appliance <-c("01", "02", "03", "04", "05", "06", "07", "08")for (household in households) {for (a in appliance) {if (!(household =="06"&& a =="08")) { df <-get(paste0("p", household, "_", a)) df <-replace(df, df <0, NA)assign(paste0("p", household, "_", a),df) } }}
Compute daily average hourly consumption by each household and each appliance.
Code
daily_consum <-function(household) {# create empty df for res daily_power <-data.frame(matrix(ncol =5, nrow =0))#provide column namescolnames(daily_power) <-c('household', 'daily_avg', 'appliance','hour','hour_avg')for (a in appliance) {if (household !="06") { df <-get(paste0("p", household, "_", a))## compute hourly average through out all the days having data hour_avg <-rowMeans(aggregate( df,list(rep(1:(nrow(df) %/%3600+1),each =3600,len =nrow(df) )), sum,na.rm =TRUE,na.action =NULL )[-1]/3600000) curr_a_power <-data.frame(household =rep(household, 24),daily_avg =rep(NA, 24),appliance =rep(a, 24),hour =seq.int(24),hour_avg = hour_avg ) }else {if (a !="08") { df <-get(paste0("p", household, "_", a)) hour_avg <-rowMeans(aggregate( df,list(rep(1:(nrow(df) %/%3600+1),each =3600,len =nrow(df) )), sum,na.rm =TRUE,na.action =NULL )[-1]/3600000) curr_a_power <-data.frame(household =rep(household, 24),daily_avg =rep(NA, 24),appliance =rep(a, 24),hour =seq.int(24),hour_avg = hour_avg ) }else {return(daily_power)} } daily_power <-rbind(daily_power, curr_a_power) }return(daily_power)}daily_power <-bind_rows(lapply(households, daily_consum), .id ="column_label")
The Plotly (Figure 7) is used to generate the visualization of the first 14 days 10-minutes energy usage by different households and different phases. The Plotly drop-down and slider features can be used to filter households and the date. The drop-down menu of Plotly is more flexible compared to the dropdown filter in Altair. There are three households. The number of drop-down menu options can be anything, but better with less options other than more. The slider, on the other hand, can handle more traces. Thus, I used drop-down menu to filter the households, while the slider is the filter of 14 dates. I aggregated the secondly data to 10-minutes, so that the pattern is more obvious. Moreover, I included a spike line and tooltip making the comparison among phases and the total on the same time point easier.
The Altair (?@fig-plotly) is used to generate the visualization of the comparison of total and individual appliance usage by household through out the day. I choose to use the combination of barplot-pie chart-barplot to show the interaction of different household can have different appliance usage at various time. When use select a household or all of the households in the first bar plot, the pie chart will show the usage for that household/total usage by appliance. I also included a tooltip to show the daily total usage by those appliances. In this way, user can see how the appliance usage different from households. Moreover, I designed a second selection as the appliance. When user click the pie chart for particular appliance, the third chart, bar plot, shows the average hourly usage by selected appliance for that household. I have considered using a stacked barplot to combied the first two plots. However, I think either using the normalized bar plot or the standard stacked bar plot, there are some information lost. For example, using normalized bar plot lost the sense of the comparison among total usage of different households, while using standard stacked bar plot making the comparison among appliance usage across different households hard.
Visualizations
Code
import numpy as npimport pandas as pdimport altair as altimport datetime as dtimport plotly.io as pioimport plotly.express as pximport plotly.offline as pyoimport plotly.graph_objects as gofrom plotly.subplots import make_subplotspio.renderers.default ="plotly_mimetype+notebook_connected"# pyo.init_notebook_mode()# import plotly.io as pio#pio.renderers.default = 'iframe_connected'
Plotly with dropdown menu and slider
dropdown menu as the households, and slider as the date (first 2 weeks of the data)
Figure 1: 10-minutes energy consumption for first 14 days
Code
steps = []# for each year and each continentfor i inrange(3*14): label = [str(s) for s in ten_min_day['date'].dt.strftime('%Y-%m-%d(%a)').unique()] step =dict( method="update", label=label[i %14], args=[{"visible": [False]*3*14*4}, {"title": "Daily total consumption: "+str("%.2f"% ten_min_day.groupby(['date','household']).agg({'value': 'sum'})['value'][i]) +" kW.h"}],# layout attribute ) step["args"][0]["visible"][i*4] =True step["args"][0]["visible"][i*4+1] =True step["args"][0]["visible"][i*4+2] =True step["args"][0]["visible"][i*4+3] =True# Toggle i'th trace to "visible" steps.append(step)sliders = []for i in np.arange(0, 3*14, 14): slider = [dict( active=0, currentvalue={"prefix": "Date: " }, pad={"t": 50}, steps=steps[i:i+14], )] sliders.append(slider)# INITIALIZE# MAKE FIRST TRACE VISABLEfig.data[0].visible =Truefig.data[1].visible =Truefig.data[2].visible =Truefig.data[3].visible =Truefig.update_layout(sliders=sliders[0])# ADD DROPDOWN TO CHANGE TYPE# "restyle": modify data or data attributes# "relayout": modify layout attributes# "update": modify data and layout attributes# "animate": start or pause an animation
Figure 2: 10-minutes energy consumption for first 14 days
Code
fig.update_layout( title=dict(text="Smart power 10-mins usage by household", font=dict(size=25), yref='paper'), legend_title="Phase", xaxis_title="Time", yaxis_title="Energy consumption, kW.h", plot_bgcolor="floralwhite", showlegend=True, hovermode='x')
Figure 3: 10-minutes energy consumption for first 14 days
Code
fig.update_xaxes(tickangle=-45, showspikes=True, spikemode='across', spikesnap='cursor', showline=True, showgrid=True, spikethickness=1, )# IMPORTANT WITH MULTIPLE ARGS YOU NEED EACH IN A DICTIONARY FOR SOME REASON
Figure 4: 10-minutes energy consumption for first 14 days
Figure 5: 10-minutes energy consumption for first 14 days
Code
fig.show()# import plotly as plt#fig.show(renderer="notebook_connected")# # Plot it and save as basic-line.html# plt.offline.plot(fig, filename = 'Test-1.html', auto_open = False)# # Show HTML# from IPython.display import HTML# HTML(filename='Test-1.html')
Figure 6: 10-minutes energy consumption for first 14 days
### first plot show the the average daily consumption selection_bar = alt.selection_single(fields=['household'],name='House')color_bar = alt.condition(selection_bar, alt.value('sienna'), alt.value('lightgray'))bar=(alt.Chart(hourly_app) .mark_bar(size=30) .encode(y='daily_avg:Q', x='household:N', color=color_bar ))text_bar = bar.mark_text( align='center', baseline='bottom', dx=1# Nudges text to right so it doesn't appear on top of the bar).encode( text=alt.Text("daily_avg:Q", format=",.2f"))bar.title =["Daily average energy consumption","by household"]bar.encoding.x.title ='Households'bar.encoding.y.title ='Daily consumption, kW.h'#(bar + text_bar).properties(width=200).add_selection(selection_bar)
Code
# pie chart to see the appliance use percentageselection_app = alt.selection_single(encodings=['theta','color'],name='app')selection_app1 = alt.selection_single(fields=['appliance_name'],name='app1')# color_app = alt.condition(selection_app,# 'appliance_name',# alt.value('lightgray'))opacity_app = alt.condition(selection_app, alt.value(1.0), alt.value(0.2))base = alt.Chart(hourly_app).transform_filter(selection_bar).encode( theta=alt.Theta("usage:Q", stack=True), color=alt.Color('appliance_name:N', legend=None), opacity=opacity_app, tooltip='combined_tooltip:N',).transform_aggregate( usage ='sum(hour_avg)', groupby=["appliance_name"]).transform_calculate( round_usage ='round(datum.usage*1000)/1000', combined_tooltip ="'Usage: ' + datum.round_usage + ' kW.h'" )pie = base.mark_arc(outerRadius=120)text_pie = base.mark_text(baseline="bottom",radius=120, size=12,lineBreak=" ",radiusOffset=22 ).encode(text="appliance_name:N" )# pie plot titlepie.title =["Usage by appliances"]#(bar + text_bar).properties(width=200,height=310).add_selection(selection_bar) | (pie + text_pie).properties(height=310).add_selection(selection_app).add_selection(selection_app1)
Code
## barplot show usage for every hourcolor_hour = alt.condition(selection_app, alt.Color('appliance_name:N'),# alt.value('steelblue'), alt.value('lightgray'))bar_hour=(alt.Chart(hourly_app) .mark_bar() .transform_filter(selection_bar ).transform_filter(selection_app1) ).encode(x=alt.X('hour:N'), y=alt.Y('hour_avg:Q'), color=alt.Color('appliance_name:N', legend=None) )bar_hour.title ="Appliance energy consumption"bar_hour.encoding.x.title ='Hour'bar_hour.encoding.y.title ='Energy consumption, kW.h'(bar + text_bar).properties(width=200,height=310).add_selection(selection_bar) | (pie + text_pie).properties(height=310).add_selection(selection_app).add_selection(selection_app1) | bar_hour.properties(height=310)
Figure 7: Appliance usage distribution by households
Reference
Beckel, Christian, Wilhelm Kleiminger, Romano Cicchetti, Thorsten Staake, and Silvia Santini. 2014. “The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms.”Proceedings of the 1st ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014).
Kleiminger, Wilhelm, Christian Beckel, and Silvia Santini. 2015. “Household Occupancy Monitoring Using Electricity Meters.”Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015).